Data visualization best practices

GEOG 30323

October 24, 2017

Data visualization

  • Thus far: we’ve learned how to use data visualization to explore our data
  • In the weeks to come:
    • Best practices in data visualization
    • Advanced chart types
    • Interactive visualization
    • Geographic visualization (maps!)
    • Putting it all together!
Source: Wikimedia Commons
Source: Wikimedia Commons
Source: Nathan Yau/FlowingData

Anscombe’s Quartet

Source: Wikimedia Commons
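
A minimal sketch of why the quartet matters, using only the Python standard library. The values are Anscombe's published data typed in by hand; the helper pearson_r() is a name made up for this illustration. All four sets share nearly identical means and correlations, yet look completely different when plotted.

```python
from statistics import mean

# Anscombe's four data sets; sets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    'I':   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                   7.24, 4.26, 10.84, 4.82, 5.68]),
    'II':  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                   6.13, 3.10, 9.13, 7.26, 4.74]),
    'III': (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    'IV':  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
             5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson_r(x, y):
    """Pearson correlation coefficient (hypothetical helper for this sketch)."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Print mean of x, mean of y, and correlation for each data set
for name, (x, y) in quartet.items():
    print(name, mean(x), round(mean(y), 2), round(pearson_r(x, y), 3))
```

The summary statistics alone cannot tell these data sets apart; a scatterplot of each can.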

Considerations when visualizing data

  • What are you visualizing?
  • Who is your audience?
  • In what format will you be presenting the visualization?

Visual variables

Source: Data Points

Color

  • Hue: color, commonly understood (red, blue, green)
  • Lightness or Value: extent to which color is light or dark
  • Saturation: vividness of the color
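
The three components can be explored with Python's built-in colorsys module. This is a rough sketch rather than a recommended workflow: the helpers hex_to_hls() and hls_to_hex() are names invented for the example, and colorsys orders the components as hue, lightness, saturation (HLS).

```python
import colorsys

def hex_to_hls(hex_color):
    """Convert '#RRGGBB' to (hue, lightness, saturation), each from 0 to 1."""
    hex_color = hex_color.lstrip('#')
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return colorsys.rgb_to_hls(r, g, b)

def hls_to_hex(hue, lightness, saturation):
    """Convert (hue, lightness, saturation) back to '#RRGGBB'."""
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return '#{:02X}{:02X}{:02X}'.format(round(r * 255), round(g * 255),
                                        round(b * 255))

# Pure red: hue 0, medium lightness, full saturation
hue, lightness, saturation = hex_to_hls('#FF0000')
print(hue, lightness, saturation)

# Keep hue and saturation fixed; vary only lightness (dark to light red)
ramp = [hls_to_hex(hue, l, saturation) for l in (0.3, 0.5, 0.7)]
print(ramp)

# Keep hue and lightness fixed; lower the saturation (vivid to greyish red)
muted = [hls_to_hex(hue, lightness, s) for s in (1.0, 0.6, 0.2)]
print(muted)
```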

Color schemes

Source: Data Points

Color and context

Source: FiveThirtyEight.com

Color-blindness

Source: SBNation.com

Good use of color

Source: Kirk Goldsberry/Grantland

Poor use of color

Source: Jonathan Cohn via Kenneth Field/Cartonerd

Color and visual variables

Examples

Let’s fetch some data:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import wb

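# Two-letter ISO codes for the 28 EU member states (as of 2017)
eu_countries = ['BE', 'BG', 'CZ', 'DK', 'DE', 'EE', 'IE', 'GR', 'ES', 'FR', 'HR', 
               'IT', 'CY', 'LV', 'LT', 'LU', 'HU', 'MT', 'NL', 'AT', 'PL', 'PT', 
               'RO', 'SI', 'SK', 'FI', 'SE', 'GB']

# World Bank indicator: unemployment, total (% of total labor force)
ue = wb.download(indicator = "SL.UEM.TOTL.ZS", 
                 country = eu_countries, start = 1991, 
                 end = 2014)

# Move the (country, year) index into regular columns
ue.reset_index(inplace = True)

ue.columns = ['country', 'year', 'unemployment']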

The ‘heat map’

Source: The Wall Street Journal

Heat maps in seaborn

  • Available in seaborn’s heatmap() function; takes a wide data frame in which the index supplies the rows (y-axis) and the column headers supply the columns (x-axis)
ue_wide = ue.pivot(index = 'country', columns = 'year', 
                   values = 'unemployment')

sns.heatmap(ue_wide)
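
To see the reshaping step without downloading anything, here is a small sketch on made-up data. The data frame toy and its values are hypothetical stand-ins for ue; only pivot() and heatmap() come from the example above.

```python
import pandas as pd
import seaborn as sns

# Hypothetical long-format data: one row per country-year pair
toy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'year': [2012, 2013, 2014, 2012, 2013, 2014],
    'unemployment': [5.0, 5.5, 6.1, 9.8, 10.4, 11.0],
})

# Long to wide: one row per country, one column per year
toy_wide = toy.pivot(index = 'country', columns = 'year',
                     values = 'unemployment')
print(toy_wide)

# annot = True writes each value inside its cell
ax = sns.heatmap(toy_wide, annot = True)

# The index name labels the y-axis; the columns name labels the x-axis
print(ax.get_ylabel(), ax.get_xlabel())
```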

The seaborn ‘heat map’

Color palettes in seaborn

  • ColorBrewer: popular color schemes for visualization
  • Support for ColorBrewer built into seaborn
  • See more at http://colorbrewer2.org/

Color in seaborn

  • Color palettes, available in the color_palette() function, can be viewed with the palplot() function
sns.palplot(sns.color_palette('Greens', 7))

  • Colors can be reversed by adding _r:
sns.palplot(sns.color_palette('Greens_r', 7))
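
A palette is, roughly speaking, a list of colors, so it can be inspected as well as plotted. The short sketch below assumes a seaborn version in which palettes provide the as_hex() method.

```python
import seaborn as sns

greens = sns.color_palette('Greens', 7)

# Each entry is an (R, G, B) tuple with values between 0 and 1
print(len(greens))
print(greens[0])

# Hex codes are handy for reuse in other tools
print(greens.as_hex())
```

In this sequential palette the first color is the lightest and the last is the darkest.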

Color in seaborn

  • color_palette() also allows for the creation of custom palettes!
colors = ['#F5A422', '#3E22F5', '#3BF522', 
          '#C722F5', '#F53E22']

pal = sns.color_palette(colors)

sns.palplot(pal)

Color in seaborn

mx = pd.read_csv('http://personal.tcu.edu/kylewalker/mexico.csv')
sns.barplot(x = 'gdp08', y = 'name', 
            data = mx.sort_values('gdp08', ascending = False), 
            palette = "Greens_r")

Highlighting and annotation

Source: Data Points

The “spaghetti” chart

Highlighting

Highlighting code

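sns.set_style('white')

# Store year as a number so it plots on a continuous x-axis
ue['year2'] = ue.year.astype(float)

# One row per year, one column per country
full = ue.pivot(index = 'year2', columns = 'country', values = 'unemployment')

greece = full['Greece']

# Draw every country in light grey, then overlay Greece in blue
full.plot(legend = False, style = 'lightgrey')
greece.plot(style = 'blue', legend = True)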
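
The same grey-then-highlight pattern, sketched on made-up data so that it runs without the World Bank download. The random series and the column names A to E are hypothetical. Passing ax explicitly is one way to make sure the highlighted line lands on the same chart as the grey ones.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: five made-up series indexed by year
rng = np.random.default_rng(0)
years = np.arange(1991, 2015)
values = rng.normal(0, 1, (len(years), 5)).cumsum(axis = 0) + 10
toy = pd.DataFrame(values, index = years, columns = list('ABCDE'))

# Background: every series in light grey, no legend
ax = toy.plot(color = 'lightgrey', legend = False)

# Foreground: one series drawn on the same axes in a contrasting color
toy['C'].plot(ax = ax, color = 'blue', legend = True)
```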

Annotation in Python

Annotation code

plt.annotate('Global recession \nspreads to Europe', xy = (2009, 9.5), 
             xycoords = 'data', xytext = (2005, 15), textcoords = 'data', 
             arrowprops = dict(arrowstyle = 'simple', color = '#000000'))
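
A self-contained sketch of the same call on a bare matplotlib chart; the plotted values are made up. In annotate(), xy is the point the arrow points to and xytext is where the label sits, both given here in data coordinates.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Made-up values for illustration only
ax.plot([2005, 2007, 2009, 2011, 2013], [8.0, 7.0, 9.5, 17.0, 27.0])

note = ax.annotate('Global recession \nspreads to Europe',
                   xy = (2009, 9.5), xycoords = 'data',      # arrow tip
                   xytext = (2005, 15), textcoords = 'data', # label position
                   arrowprops = dict(arrowstyle = 'simple', color = '#000000'))
```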

Small multiples

Source: Data Points

Small multiples in pandas

# One small chart per country: 28 countries fill a 7 x 4 grid
full.plot(subplots = True, layout = (7, 4), 
          figsize = (12, 10), sharey = True)

Faceted plots in seaborn

  • Plotting functions in seaborn can be “faceted” with catplot (named factorplot before seaborn 0.9) or mapped onto a faceted plot with FacetGrid

Modifying chart options

  • .plot() in pandas and the plotting functions in seaborn are wrappers around matplotlib, the main plotting engine for Python
  • As a result, all matplotlib customization methods are available for your pandas and seaborn plots - and there are many!
  • To get access: import matplotlib.pyplot as plt

Formatting axes & labels

  • Example:
plt.figure(figsize = (10, 7))

sns.heatmap(ue_wide, cmap = 'YlGnBu')

plt.ylabel("")
plt.xlabel("")
plt.title("Unemployment in Europe, 1991-2014")
plt.xticks(rotation = 45)

seaborn and matplotlib

  • seaborn functions such as heatmap() return a matplotlib Axes object that can be modified by the options in the pyplot module
  • Often, these options are wrapped by seaborn and .plot() in pandas and available as arguments - so check the documentation to see what you can do!
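
A short sketch of working with the returned object directly, on made-up data (the data frame toy is hypothetical). The Axes methods used here are standard matplotlib; they are an alternative to the plt.xlabel() and plt.title() calls shown earlier.

```python
import pandas as pd
import seaborn as sns

# Made-up data for illustration only
toy = pd.DataFrame({'name': ['A', 'B', 'C'], 'value': [3, 7, 5]})

# barplot() hands back the matplotlib Axes it drew on
ax = sns.barplot(x = 'value', y = 'name', data = toy, color = 'steelblue')
print(type(ax))

# Customize through the Axes object
ax.set_xlabel('')
ax.set_title('A toy bar chart')
```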

Putting it all together

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style('white')

ue['year2'] = ue.year.astype(float)

full = ue.pivot(index = 'year2', columns = 'country', 
                values = 'unemployment')

greece = full['Greece']

# All countries in light grey, with Greece highlighted in blue
full.plot(style = 'lightgrey', legend = False, figsize = (10, 7))
greece.plot(style = 'blue', legend = True)

plt.xlabel("")
plt.ylabel("Unemployment rate")
plt.annotate('Global recession \nspreads to Europe', xy = (2009, 9.5), 
             xycoords = 'data', xytext = (2005, 15), textcoords = 'data', 
             arrowprops = dict(arrowstyle = 'simple', color = '#000000'))
# Label the y-axis every 5 points, formatted as percentages
plt.yticks(range(0, 31, 5), [str(x) + '%' for x in range(0, 31, 5)])

Logarithmic scales

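  • Modification of an axis scale so that each step multiplies the value (generally by powers of 10, \(10^{n}\)); this better shows trends in data that span several orders of magnitude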
mx.plot(x = 'mus09', y = 'gdp08', kind = 'scatter', logy = True)
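
A small sketch with made-up numbers that grow roughly tenfold at each step; the data frame toy is hypothetical. On a linear axis the first few points would be squashed against the bottom of the chart, while logy = True spaces them evenly.

```python
import pandas as pd

# Made-up values spanning several orders of magnitude
toy = pd.DataFrame({'x': [1, 2, 3, 4, 5],
                    'y': [12, 150, 1300, 11000, 140000]})

ax = toy.plot(x = 'x', y = 'y', kind = 'scatter', logy = True)

# Confirm which scale the y-axis is using
print(ax.get_yscale())
```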

Scatterplot smoothing

  • Local regression (LOESS, also called LOWESS) is used to produce smooth curves through data; seaborn’s lowess option relies on the statsmodels package
sns.lmplot(data = mx, x = 'mus09', y = 'pri10', lowess = True)

Scatterplot matrices

sns.pairplot(data = mx, vars = ['gdp08', 'mus09', 'pri10'])

Image resolution

  • Higher resolution: greater detail in an image
  • Commonly measured in dpi (dots per inch)
Source: Wikimedia Commons

Exporting your visualizations

  • To save your visualizations from the Jupyter Notebook:
plt.savefig('destfile.jpg', dpi = 300)
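
To connect dpi with image size: the saved image's pixel dimensions are the figure size in inches multiplied by the dpi. The sketch below checks this on a throwaway chart. The temporary folder and the PNG format are choices made for this illustration, not requirements.

```python
import os
import tempfile
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# A 4 x 3 inch figure with a throwaway line
fig, ax = plt.subplots(figsize = (4, 3))
ax.plot([0, 1, 2], [0, 1, 4])

# Save at 100 dots per inch into a temporary folder
out_path = os.path.join(tempfile.mkdtemp(), 'destfile.png')
fig.savefig(out_path, dpi = 100)

# Read the file back and report its size in pixels
height, width = mpimg.imread(out_path).shape[:2]
print(width, height)
```

At dpi = 300, the same figure would be saved at three times the width and height in pixels.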